SI649 W23 Altair Homework #4

Overview

We'll focus on maps and cartographic visualization. In this lab, you will practice:

  • Point Maps
  • Symbol Maps
  • Choropleth maps
  • Interactions with maps

After building these charts, you will use Streamlit to make a website that displays them.

Lab Instructions

  • Save, rename, and submit the .ipynb file (use your username in the name).
  • Complete all the checkpoints to create the required visualization at each cell.
  • Run every cell (use Runtime -> Restart and run all to make sure you have a clean working version), print to PDF, and submit the PDF file.
  • If you get stuck, show us your work by including links (URLs) to the pages you searched. You'll get partial credit for showing your work in progress.
In [1]:
import pandas as pd
import altair as alt
from vega_datasets import data

alt.data_transformers.disable_max_rows()

df = pd.read_csv('https://raw.githubusercontent.com/pratik-mangtani/si649-hw/main/airports.csv')
url = "https://raw.githubusercontent.com/pratik-mangtani/si649-hw/main/small-airports.json"

Visualization 1: Dot Density Map

[Image: vis1]

Description of the visualization:

We want to visualize the density of small airports around the world, with each small airport drawn as a dot. The visualization has two layers, plus a tooltip:

  • The base layer shows the outline of the world map.
  • The point map shows different small airports.
  • The tooltip shows the name of the airport.

Hint:

  • How can we show continents on the map? Which object from the JSON dataset can be used?
  • How can we show only small airports on the map?
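For the second hint, one common approach is a boolean mask on the `type` column. The sketch below uses a hypothetical three-row table (the names and coordinates are made up for illustration) rather than the real airports file:

```python
import pandas as pd

# Hypothetical miniature of the airports table (values are illustrative only).
airports = pd.DataFrame({
    "name": ["Alpha Field", "Beta Heliport", "Gamma Strip"],
    "type": ["small_airport", "heliport", "small_airport"],
    "latitude_deg": [40.0, 41.5, 35.2],
    "longitude_deg": [-75.0, -74.2, -90.1],
})

# The comparison yields a True/False Series; indexing with it keeps the True rows.
small_only = airports[airports["type"] == "small_airport"]
print(small_only["name"].tolist())
```

The same pattern applies to the full `df` loaded above, since it has the same `type` column.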
In [2]:
df
Out[2]:
id ident type name latitude_deg longitude_deg elevation_ft continent iso_country iso_region municipality scheduled_service gps_code iata_code local_code home_link wikipedia_link keywords
0 6523 00A heliport Total Rf Heliport 40.070801 -74.933601 11.0 NaN US US-PA Bensalem no 00A NaN 00A NaN NaN NaN
1 323361 00AA small_airport Aero B Ranch Airport 38.704022 -101.473911 3435.0 NaN US US-KS Leoti no 00AA NaN 00AA NaN NaN NaN
2 6524 00AK small_airport Lowell Field 59.947733 -151.692524 450.0 NaN US US-AK Anchor Point no 00AK NaN 00AK NaN NaN NaN
3 6525 00AL small_airport Epps Airpark 34.864799 -86.770302 820.0 NaN US US-AL Harvest no 00AL NaN 00AL NaN NaN NaN
4 506791 00AN small_airport Katmai Lodge Airport 59.093287 -156.456699 80.0 NaN US US-AK King Salmon no 00AN NaN 00AN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
74751 46378 ZZ-0001 heliport Sealand Helipad 51.894444 1.482500 40.0 EU GB GB-ENG Sealand no NaN NaN NaN http://www.sealandgov.org/ https://en.wikipedia.org/wiki/Principality_of_... Roughs Tower Helipad
74752 307326 ZZ-0002 small_airport Glorioso Islands Airstrip -11.584278 47.296389 11.0 AF TF TF-U-A Grande Glorieuse no NaN NaN NaN NaN NaN NaN
74753 346788 ZZ-0003 small_airport Fainting Goat Airport 32.110587 -97.356312 690.0 NaN US US-TX Blum no 87TX NaN 87TX NaN NaN NaN
74754 342102 ZZZW closed Scandium City Heliport 69.355287 -138.939310 4.0 NaN CA CA-YT (Old) Scandium City no NaN NaN NaN NaN NaN ZZZW, ZZZW, ZYW, YK96
74755 313629 ZZZZ small_airport Satsuma Iōjima Airport 30.784722 130.270556 338.0 AS JP JP-46 Mishima no RJX7 NaN RJX7 NaN http://wikimapia.org/6705190/Satsuma-Iwo-jima-... SATSUMA,IWOJIMA,RJX7

74756 rows × 18 columns

In [3]:
small_airport = df[df['type'] == "small_airport"]
In [4]:
world = data.world_110m.url
In [5]:
# Base layer: country outlines, with Antarctica (id 10) filtered out.
land = alt.Chart(alt.topo_feature(world, 'countries')).transform_filter(alt.datum.id != 10).mark_geoshape(
   fill='lightgray'
).project(
   type='mercator'
)
In [6]:
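# TODO: Vis 1
# Point layer: one small red dot per small airport, with the name as tooltip.
sa = alt.Chart(small_airport).mark_circle().encode(
   longitude='longitude_deg:Q',
   latitude='latitude_deg:Q',
   size=alt.value(3),
   color=alt.value('red'),
   tooltip='name'
).project(
   "mercator"
)
# Draw the airports on top of the base map; the size is set once on the layered chart.
# The name avoids shadowing Python's built-in map().
airport_map = alt.layer(land, sa).properties(
   width=600,
   height=400
)
airport_map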
Out[6]:

Visualization 2: Proportional Symbol

[Image: vis2]

Description of the visualization:

The visualization shows faceted maps, one per year, of the 20 most populous cities in the world through 2100. Each faceted chart has two layers, plus a tooltip:

  • The base layer shows the map of countries.
  • The second layer shows size-encoded points indicating the population of those cities.
  • The tooltip shows the city name and population.

Hint:

  • Which projection has been used in individual charts?
  • How can you create a faceted chart across the different years with 2 columns?
In [7]:
countries_url = data.world_110m.url
source = 'https://raw.githubusercontent.com/pratik-mangtani/si649-hw/main/population_prediction.csv'
In [8]:
df = pd.read_csv(source)
In [9]:
df
Out[9]:
              city  population  year    lat     lon
  0          Tokyo        36.1  2010   35.7   139.7
  1          Tokyo        36.4  2025   35.7   139.7
  2          Tokyo        32.6  2050   35.7   139.7
  3          Tokyo        28.9  2075   35.7   139.7
  4         Mexico        20.1  2010   23.6  -102.6
...            ...         ...   ...    ...     ...
 95       Lilongwe        41.4  2100  -14.0    33.8
 96  Blantyre City        40.9  2100  -15.8    35.0
 97        Kampala        40.1  2100    0.3    32.6
 98         Lusaka        37.7  2100  -15.4    28.3
 99      Mogadishu        36.4  2100    2.0    45.3

100 rows × 5 columns

In [10]:
# TODO: Vis 2
years = [2010, 2025, 2050, 2075, 2100]
# Base layer: country outlines, shared by every panel.
land = alt.Chart(alt.topo_feature(countries_url, 'countries')).mark_geoshape(
   fill='lightgray'
).project(
   type='equirectangular'
)
figures = []
for year in years:
   # One panel per year: circles sized by predicted population.
   year_df = df[df['year'] == year]
   points = alt.Chart(year_df).mark_circle().encode(
      longitude='lon:Q',
      latitude='lat:Q',
      size=alt.Size('population:Q', title='Population (millions)'),
      color=alt.value('green'),
      tooltip=['city', 'population']
   ).project(
      'equirectangular'
   ).properties(title=str(year))
   figures.append(alt.layer(land, points))

# Arrange the five panels two per row.
((figures[0] | figures[1]) & (figures[2] | figures[3]) & figures[4]).properties(
   title="The 20 Most Populous Cities in the World by 2100"
)
Out[10]:

Visualization 3: Hurricane Trajectories

[Image: vis3]

Description of the visualization:

Create a map that shows the paths (trajectories) of the 2017 hurricanes. Filter the data so that only 2017 hurricanes are shown. Remove Alaska and Hawaii from the map (Filter out ids 2 and 15).

Hint:

  • How will you filter out 2017 hurricanes?
  • Which object can be used to show state boundaries?
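For the first hint, one way to filter by year is to parse the timestamp strings and compare the `.dt.year` accessor. The sketch below runs on a hypothetical three-row table (storm names and dates are made up) that mimics the ISO-style `datetime` strings in the hurricane data:

```python
import pandas as pd

# Hypothetical miniature of the hurricane table (values are illustrative only).
tracks = pd.DataFrame({
    "name": ["ALPHA", "ALPHA", "BRAVO"],
    "datetime": [
        "2016-09-01T00:00:00",
        "2017-08-30T06:00:00",
        "2017-09-05T12:00:00",
    ],
})

# Parse the whole column at once, then keep rows whose year is 2017.
tracks["datetime"] = pd.to_datetime(tracks["datetime"])
only_2017 = tracks[tracks["datetime"].dt.year == 2017]
print(only_2017["name"].tolist())
```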
In [11]:
states_url = data.us_10m.url
hurricane_data = pd.read_csv('https://raw.githubusercontent.com/pratik-mangtani/si649-hw/main/hurdat2.csv')
hurricane_data.sample(3)
Out[11]:
       identifier     name  num_pts  record_id  status  latitude  longitude  max_wind  min_pressure             datetime
27249    AL031965    BETSY       69        NaN      HU      25.1      -80.7       110           952  1965-09-08T12:00:00
31449    AL141974     FIFI       33        NaN      TD      16.5      -71.0        30          -999  1974-09-15T12:00:00
12914    AL061915  UNNAMED       42          L      HU      29.0      -90.3       110           944  1915-09-29T18:00:00
In [12]:
hurricane_data.head(17)
Out[12]:
    identifier     name  num_pts  record_id  status  latitude  longitude  max_wind  min_pressure             datetime
 0    AL011851  UNNAMED       14        NaN      HU      28.0      -94.8        80          -999  1851-06-25T00:00:00
 1    AL011851  UNNAMED       14        NaN      HU      28.0      -95.4        80          -999  1851-06-25T06:00:00
 2    AL011851  UNNAMED       14        NaN      HU      28.0      -96.0        80          -999  1851-06-25T12:00:00
 3    AL011851  UNNAMED       14        NaN      HU      28.1      -96.5        80          -999  1851-06-25T18:00:00
 4    AL011851  UNNAMED       14          L      HU      28.2      -96.8        80          -999  1851-06-25T21:00:00
 5    AL011851  UNNAMED       14        NaN      HU      28.2      -97.0        70          -999  1851-06-26T00:00:00
 6    AL011851  UNNAMED       14        NaN      TS      28.3      -97.6        60          -999  1851-06-26T06:00:00
 7    AL011851  UNNAMED       14        NaN      TS      28.4      -98.3        60          -999  1851-06-26T12:00:00
 8    AL011851  UNNAMED       14        NaN      TS      28.6      -98.9        50          -999  1851-06-26T18:00:00
 9    AL011851  UNNAMED       14        NaN      TS      29.0      -99.4        50          -999  1851-06-27T00:00:00
10    AL011851  UNNAMED       14        NaN      TS      29.5      -99.8        40          -999  1851-06-27T06:00:00
11    AL011851  UNNAMED       14        NaN      TS      30.0     -100.0        40          -999  1851-06-27T12:00:00
12    AL011851  UNNAMED       14        NaN      TS      30.5     -100.1        40          -999  1851-06-27T18:00:00
13    AL011851  UNNAMED       14        NaN      TS      31.0     -100.2        40          -999  1851-06-28T00:00:00
14    AL021851  UNNAMED        1        NaN      HU      22.2      -97.6        80          -999  1851-07-05T12:00:00
15    AL031851  UNNAMED        1        NaN      TS      12.0      -60.0        50          -999  1851-07-10T12:00:00
16    AL041851  UNNAMED       49        NaN      TS      13.4      -48.0        40          -999  1851-08-16T00:00:00
In [13]:
# Parse the whole column in one vectorized call.
hurricane_data['datetime'] = pd.to_datetime(hurricane_data['datetime'])
In [14]:
type(hurricane_data.datetime.iloc[0])
Out[14]:
pandas._libs.tslibs.timestamps.Timestamp
In [15]:
h2017 = hurricane_data[hurricane_data['datetime'].dt.year == 2017]
In [16]:
# TODO: Vis 3
# Base layer: state outlines without Alaska (id 2) and Hawaii (id 15).
states = alt.topo_feature(states_url, 'states')
state = alt.Chart(states).transform_filter(
    (alt.datum.id != 2) & (alt.datum.id != 15)
).mark_geoshape(
    fill='white',
    stroke='black'
)

# Trajectory layer: `detail` draws one line per storm instead of joining
# every 2017 track into a single path.
hurricane = alt.Chart(h2017).mark_line().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    detail='identifier:N',
    size=alt.value(1),
    color=alt.value('blue'),
    tooltip='name'
)
# The size is set once on the layered chart.
hurricane_map = alt.layer(state, hurricane).properties(
    width=500,
    height=300
)
hurricane_map
Out[16]:

Visualization 4: Choropleth Map

[Image: vis4]

Interaction

[Image: vis4]

Description of the visualization:

The visualization has a choropleth map showing the population of different states and a sorted bar chart showing the top 15 states by population. These charts are connected using a click interaction.
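As a small illustration of selecting the "top N" rows for the bar chart, pandas offers `nlargest`. The table below is hypothetical (four made-up states) and the sketch is only one of several equivalent ways to do this:

```python
import pandas as pd

# Hypothetical miniature of a state population table (values are illustrative only).
states_demo = pd.DataFrame({
    "state": ["A", "B", "C", "D"],
    "population": [5, 20, 10, 15],
})

# nlargest returns the n rows with the highest values, ordered from largest to smallest.
top2 = states_demo.nlargest(2, "population")
print(top2["state"].tolist())
```

Sorting the full table and taking the last rows with `tail` gives the same set of states, just in ascending order.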

Hint:

  • Which object can be used to show states on the map?
  • Which transform can be used to add population data to the geographic data? How can we combine two datasets in Altair?
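Conceptually, attaching attribute data to geographic features by a shared key behaves much like a left join. The sketch below shows that idea in pandas as an analogy only; it is not the Altair transform itself. The populations for ids 6 and 48 are taken from the state table in this notebook, while id 99 is a hypothetical shape with no matching row:

```python
import pandas as pd

# Stand-in for the geographic features: one row per shape id (99 is hypothetical).
shapes = pd.DataFrame({"id": [6, 48, 99]})

# Attribute table keyed by the same id.
pop = pd.DataFrame({
    "id": [6, 48],
    "state": ["California", "Texas"],
    "population": [39250017, 27862596],
})

# A left join keeps every shape; shapes without a match get missing values.
joined = shapes.merge(pop, on="id", how="left")
print(joined)
```

In a choropleth, a shape left without a match has no value to map to color, which is one reason to check that the key columns line up before plotting.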
In [17]:
state_map = data.us_10m.url
state_pop = data.population_engineers_hurricanes()[['state', 'id', 'population']]
# Keep the 15 most populous states, sorted from smallest to largest.
state_15_sorted = state_pop.sort_values(by='population', ascending=True).tail(15)
state_15_sorted
Out[17]:
             state  id  population
21   Massachusetts  25     6811779
 2         Arizona   4     6931071
47      Washington  53     7288000
46        Virginia  51     8411808
30      New Jersey  34     8944469
22        Michigan  26     9928300
33  North Carolina  37    10146788
10         Georgia  13    10310371
35            Ohio  39    11614373
38    Pennsylvania  42    12784227
13        Illinois  17    12801539
32        New York  36    19745289
 9         Florida  12    20612439
43           Texas  48    27862596
 4      California   6    39250017
In [18]:
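# Click selection keyed on the state name, shared by the map and the bar chart.
# (In Altair 5, alt.selection_point and .add_params replace selection_single and add_selection.)
selection = alt.selection_single(fields=['state'])
# Selected marks stay fully opaque; all others are dimmed.
opacityCondition = alt.condition(selection, alt.value(1), alt.value(0.6))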
In [19]:
# TODO: Vis 4
states = alt.topo_feature(state_map, 'states')

# Choropleth: join population onto each state shape by id, then color by it.
map = alt.Chart(states).mark_geoshape().encode(
    color='population:Q',
    tooltip=['state:N', 'population:Q']
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(state_pop, 'id', list(state_pop.columns))
).properties(
    width=500,
    height=300
).project(
    type='albersUsa'
)
In [20]:
# Bar chart of the top 15 states, sorted by population.
bars = alt.Chart(state_15_sorted, width=500, title="Top 15 States by Population").mark_bar().encode(
    alt.X('population:Q', title=""),
    alt.Y('state:N', sort='x', title=""),
    color=alt.Color('population:Q')
)
In [21]:
# map | bars
In [22]:
imap = map.encode(opacity=opacityCondition).add_selection(selection)
ibars = bars.encode(opacity=opacityCondition).add_selection(selection)
(imap | ibars).configure_view(strokeWidth=0)
Out[22]: